Alistipes finegoldii Rautio et al. 2003 is one of five species of Alistipes with a validly
published name: family Rikenellaceae, order Bacteroidales, class Bacteroidia, phylum
Bacteroidetes. This rod-shaped and strictly anaerobic organism has been isolated mostly from
human tissues. Here we describe the features of the type strain of this species, together with
the complete genome sequence and annotation. A. finegoldii is the first member of the genus
Alistipes for which the complete genome sequence of its type strain is available. The
3,734,239 bp long single-replicon genome with its 3,234 protein-coding and 68 RNA genes
is part of the Genomic Encyclopedia of Bacteria and Archaea project.



(AHN2437T)



Introduction 



Strain AHN2437T (= DSM 17242 = CCUG 46020 = JCM 16770) is the type strain of
Alistipes finegoldii [1,2]. This strain is one of several strains with similar
properties [3] that were isolated mainly from pediatric patients with inflamed,
gangrenous or non-inflamed appendices [4,5]. Though the type strain AHN2437T
resembled members of the Bacteroides fragilis group in bile resistance and
positive indole reaction, it was found, together with the type strain of
Bacteroides putredinis, to form a separate phylogenetic lineage apart from
authentic Bacteroides species [1]. The genus Alistipes was established to
accommodate these two species and has subsequently been enlarged to encompass
three additional species with validly published names and one with an
effectively published name [6,7]. According to its position in the
'All-Species Living Tree' 16S rRNA gene sequence dendrogram [8], the genus
Alistipes is a sister clade of Rikenella microfusus, formerly Bacteroides
microfusus [9,10], the two genera constituting the family Rikenellaceae [11,12].
Here we present a summary classification and a set of features for
A. finegoldii AHN2437T together with the description of the complete genomic
sequencing and annotation.







The Genomic Standards Consortium 



Mavromatis et al.



Classification and features 
16S rRNA gene sequence analysis

A representative genomic 16S rRNA gene sequence of A. finegoldii AHN2437T was
compared using NCBI BLAST [13,14] under default settings (e.g., considering
only the high-scoring segment pairs (HSPs) from the best 250 hits) with the
most recent release of the Greengenes database [15], and the relative
frequencies of taxa and keywords (reduced to their stem [16]) were determined,
weighted by BLAST scores. The most frequently occurring genera were Alistipes
(84.4%) and Bacteroides (15.6%) (19 hits in total). Regarding the three hits to
sequences from members of the species, the average identity within HSPs was
98.7%, whereas the average coverage by HSPs was 98.0%. Regarding the nine hits
to sequences from other members of the genus, the average identity within HSPs
was 96.5%, whereas the average coverage by HSPs was 100.1%. Among all other
species, the one yielding the highest score was Alistipes shahii (AB554233),
which corresponded to an identity of 97.2% and an HSP coverage of 100.0%.
(Note that the Greengenes database uses the INSDC (= EMBL/NCBI/DDBJ)
annotation, which is not an authoritative source for nomenclature or
classification.) The highest-scoring environmental sequence was AY643083
(Greengenes short name 'Isolation finegoldii blood two patients colon cancer
Alistipes finegoldii; clone 3'), which showed an identity of 100.0% and an HSP
coverage of 99.4%. The most frequently occurring keywords within the labels of
all environmental samples which yielded hits were 'human' (11.5%), 'fecal'
(8.1%), 'intestin' (5.5%), 'biopsi' (4.2%) and 'mucos' (4.0%) (231 hits in
total). The most frequently occurring keywords within the labels of those
environmental samples which yielded hits of a higher score than the
highest-scoring species were 'finegoldii' (18.2%), 'alistip, blood, cancer,
colon, isol, patient, two' (9.1%) and 'fecal, human' (9.1%) (2 hits in total).
These keywords are in accordance with the original isolation source of
A. finegoldii.
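
The score-weighted relative frequencies described above can be sketched as follows. This is only an illustration of the weighting idea, not the pipeline used for the analysis: the function name `weighted_frequencies` and the example hits and scores are hypothetical and are not taken from the Greengenes search.

```python
from collections import defaultdict


def weighted_frequencies(hits):
    """Return {label: percentage}, weighting each hit by its BLAST score.

    `hits` is an iterable of (label, score) pairs, where the label is a
    genus name or a stemmed keyword.
    """
    totals = defaultdict(float)
    for label, score in hits:
        totals[label] += score
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {label: 100.0 * s / grand_total for label, s in totals.items()}


# Hypothetical (genus, score) pairs; NOT the data behind the reported values.
example_hits = [
    ("Alistipes", 2500.0),
    ("Alistipes", 2400.0),
    ("Bacteroides", 1100.0),
]

freqs = weighted_frequencies(example_hits)
for genus, pct in sorted(freqs.items(), key=lambda kv: -kv[1]):
    print(f"{genus}: {pct:.1f}%")
```

Weighting by score rather than counting hits lets a few strong matches outweigh many weak ones, which is presumably why the reported percentages do not correspond to simple hit counts.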

Figure 1 shows the phylogenetic neighborhood of A. finegoldii in a 16S rRNA
gene-based tree. The sequences of the two 16S rRNA gene copies in the genome
differ from each other by ten nucleotides, and differ by up to ten nucleotides
from the previously published 16S rRNA gene sequence (AY643083).



Morphology and physiology 

Most strains of A. finegoldii were isolated on Bacteroides-bile-esculin (BBE)
agar, others on kanamycin/vancomycin laked blood agar. Cells stain
Gram-negative, and are non-spore-forming and rod-shaped with rounded ends
(0.2 × 0.8 to 2 µm), mostly occurring singly, though longer filaments are
observed occasionally (Figure 2). After 4 days of growth on Brucella sheep
blood agar, colonies are 0.3-1.0 mm in diameter, circular, gray, translucent
or opaque and weakly β-hemolytic. On laked rabbit blood agar, colonies are
light brown after 4 days of incubation, turning reddish or chocolate brown
after 10 days [1,3]. Growth temperature is 37°C [31]. The organism is strictly
anaerobic, indole-positive, catalase-negative and grows in peptone-yeast
extract-glucose containing 20% bile [1,3]. Nitrate is not reduced to nitrite,
gelatin is liquefied and esculin hydrolysis is negative. Metabolism is
fermentative; however, due to scanty growth on agar media and in liquid media,
carbohydrate metabolism is difficult to evaluate. In PYG broth, succinic acid
is the major end product, while acetic and propionic acids are minor products;
isovaleric and lactic acids are sometimes produced in very small amounts. Acid
and alkaline phosphatases, N-acetyl-β-glucosaminidase, esterase, esterase
lipase, α- and β-galactosidases, and α-glucosidase are detected in the API ZYM
(bioMérieux) gallery, while no activity is detected for lipase C4,
leucine/valine/cystine arylamidases, trypsin, β-glucuronidase, β-glucosidase
or α-mannosidase. In addition, using Rosco diagnostic tablets (Rosco,
Taastrup, Denmark), α-fucosidase is detected, but not β-xylosidase or trypsin.
Strains are resistant to vancomycin (5 µg), kanamycin (1,000 µg), and colistin
(10 µg). Susceptibility to penicillin varies and some strains produce
β-lactamase (reaction for the type strain has not been specified) [1,3].

Strain AHN2437T was isolated from a human
appendiceal tissue sample. The habitat is not 
known but strains are probably members of the 
microflora of the human gut [1]. A. finegoldii-type 
organisms were identified by molecular methods 
as part of the microbiota of chicken guts [32] and 
they were detected in blood cultures from colon 
cancer patients [33]. 

Chemotaxonomy

The major cellular fatty acid of strain AHN2437T is iso-C15:0; smaller amounts
(with 5 to 10% occurrence) are anteiso-C15:0, C15:0, C16:0, iso-C17:0, and one
or both of C17:0 iso-3OH/C18:2 DMA. The mol% G+C of DNA is 57 [1,3]. No
information is available for the peptidoglycan composition, isoprenoid
composition, polar lipids or whole cell sugars.

Genome sequencing and annotation 

Genome project history 

This organism was selected for sequencing on the basis of its phylogenetic
position [34], and is part of the Genomic Encyclopedia of Bacteria and Archaea
project [35]. The genome project is deposited in the Genomes OnLine Database
[23] and the complete genome sequence is deposited in GenBank. Sequencing,
finishing and annotation were performed by the DOE Joint Genome Institute
(JGI) using state-of-the-art sequencing technology [46]. A summary of the
project information is shown in Table 2.

Growth conditions and DNA isolation 

A. finegoldii strain AHN2437T, DSM 17242, was grown anaerobically in DSMZ
medium 104 (PYG, supplemented with vitamin solution (see DSMZ medium 131))
[36] at 37°C. DNA was isolated from 1-1.5 g of cell paste using the MasterPure
Gram-positive DNA purification kit (Epicentre MGP04100) following the standard
protocol as recommended by the manufacturer with modification st/LALM for cell
lysis as described in Wu et al. 2009 [35]. DNA is available through the DNA
Bank Network [37].



[Figure 1: phylogenetic tree. Taxa shown: Alistipes finegoldii (IMG2509822740) **, Alistipes onderdonkii (AY974071), Alistipes shahii (AY974072), Alistipes putredinis (L16497), Alistipes indistinctus (AB490804) *, Rikenella microfusus (L16498) *. Support values shown on branches: 99/99 and 99/100.]



Figure 1. Phylogenetic tree highlighting the position of A. finegoldii relative to the type strains of the other species within the
family Rikenellaceae. The tree was inferred from 1,432 aligned characters [17,18] of the 16S rRNA gene sequence under the
maximum likelihood (ML) criterion [19]. Rooting was done initially using the midpoint method [20] and then checked for its
agreement with the current classification (Table 1). The branches are scaled in terms of the expected number of substitutions
per site. Numbers adjacent to the branches are support values from 1,000 ML bootstrap replicates [21] (left) and from 1,000
maximum-parsimony bootstrap replicates [22] (right) if larger than 60%. Lineages with type strain genome sequencing
projects registered in GOLD [23] are labeled with one asterisk, those also listed as 'Complete and Published' with two asterisks.
See also the species with not yet validly published names described together with their genome sequences in [6].

Figure 2. Scanning electron micrograph of A. finegoldii AHN2437T



Genome sequencing and assembly 

The genome was sequenced using a combination of Illumina and 454 sequencing
platforms. All general aspects of library construction and sequencing can be
found at the JGI website [38]. Pyrosequencing reads were assembled using the
Newbler assembler (Roche). The initial Newbler assembly, consisting of 103
contigs in four scaffolds, was converted into a phrap [39] assembly by making
fake reads from the consensus, to collect the read pairs in the 454 paired end
library. Illumina GAii sequencing data (500.5 Mb) was assembled with Velvet
[40] and the consensus sequences were shredded into 2.0 kb overlapped fake
reads and assembled together with the 454 data. The 454 draft assembly was
based on 160.8 Mb 454 draft data and all of the 454 paired end data. Newbler
parameters are -consed -a 50 -l 350 -g -m -ml 20. The Phred/Phrap/Consed
software package [39] was used for sequence assembly and quality assessment in
the subsequent finishing process. After the shotgun stage, reads were
assembled with parallel phrap (High Performance Software, LLC). Possible
mis-assemblies were corrected with gapResolution [38], Dupfinisher [41], or by
sequencing cloned bridging PCR fragments with subcloning. Gaps between contigs
were closed by editing in Consed, by PCR and by Bubble PCR primer walks
(J.-F. Chang, unpublished). A total of 696 additional reactions and 2 shatter
libraries were necessary to close gaps and to raise the quality of the
finished sequence. Illumina reads were also used to correct potential base
errors and increase consensus quality using the Polisher software developed at
JGI [42]. The error rate of the completed genome sequence is less than 1 in
100,000. Together, the combination of the Illumina and 454 sequencing
platforms provided 161.1 × coverage of the genome. The final assembly
contained 324,940 pyrosequence and 13,793,104 Illumina reads.
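
The step of shredding Velvet consensus sequences into 2.0 kb overlapping fake reads can be illustrated with a short sketch. It is not the JGI tool: the function name `shred` and the 1,000 bp step between read starts are assumptions, since the text gives only the 2.0 kb read length and states that the reads overlap.

```python
def shred(consensus, read_len=2000, step=1000):
    """Cut a consensus sequence into overlapping fixed-length pieces.

    A step smaller than read_len makes neighbouring pieces overlap.
    The default step of 1,000 bp is an assumption for illustration.
    """
    if read_len <= 0 or step <= 0:
        raise ValueError("read_len and step must be positive")
    if len(consensus) <= read_len:
        # Short contigs are returned whole (or not at all if empty).
        return [consensus] if consensus else []
    starts = list(range(0, len(consensus) - read_len + 1, step))
    # Add a final piece if the regular grid does not reach the contig end.
    if starts[-1] + read_len < len(consensus):
        starts.append(len(consensus) - read_len)
    return [consensus[s:s + read_len] for s in starts]


# Toy 5,000 bp contig; real input would be Velvet consensus sequences.
contig = "ACGT" * 1250
reads = shred(contig)
print(len(reads), [len(r) for r in reads])
```

Overlapping pieces let an overlap-based assembler such as phrap re-join the shredded consensus while treating it like ordinary read data.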

Genome annotation 

Genes were identified using Prodigal [43] as part of the DOE-JGI genome
annotation pipeline [47], followed by a round of manual curation using the JGI
GenePRIMP pipeline [44]. The predicted CDSs were translated and used to search
the National Center for Biotechnology Information (NCBI) non-redundant
database, UniProt, TIGR-Fam, Pfam, PRIAM, KEGG, COG, and InterPro databases.
Additional gene prediction analysis and functional annotation was performed
within the Integrated Microbial Genomes - Expert Review (IMG-ER) platform [45].

Genome properties

The genome statistics are provided in Table 3 and Figure 3. The genome
consists of one circular chromosome with a total length of 3,734,239 bp and a
G+C content of 56.6%. Of the 3,302 genes predicted, 3,234 were protein-coding
genes, and 68 RNAs; 121 pseudogenes were also identified. The majority of the
protein-coding genes (62.0%) were assigned a putative function while the
remaining ones were annotated as hypothetical proteins. The distribution of
genes into COGs functional categories is presented in Table 4.
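
The percentages quoted for the genome can be re-derived from the raw counts reported in Table 3. The sketch below is only a consistency check; the variable names and the helper `pct` are illustrative and not part of any annotation pipeline.

```python
# Counts as reported in the genome statistics (Table 3).
genome_size_bp = 3_734_239
gc_bp = 2_115_287
coding_bp = 3_244_847
total_genes = 3_302
protein_coding = 3_234
rna_genes = 68
with_function = 2_046


def pct(part, whole):
    """Percentage of `whole` represented by `part`, rounded to 2 decimals."""
    return round(100.0 * part / whole, 2)


# Protein-coding and RNA genes together make up the total gene count.
assert protein_coding + rna_genes == total_genes

print("G+C content      :", pct(gc_bp, genome_size_bp))       # 56.65
print("Coding density   :", pct(coding_bp, genome_size_bp))   # 86.89
print("Protein-coding   :", pct(protein_coding, total_genes)) # 97.94
print("RNA genes        :", pct(rna_genes, total_genes))      # 2.06
print("Function assigned:", pct(with_function, total_genes))  # 61.96
```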



Table 1. Classification and general features of A. finegoldii AHN2437T according to the MIGS recommendations [24].

| MIGS ID | Property | Term | Evidence code |
|---|---|---|---|
| | Current classification | Domain Bacteria | TAS [25] |
| | | Phylum Bacteroidetes | TAS [12,26] |
| | | Class Bacteroidia | TAS [12,27] |
| | | Order Bacteroidales | TAS [12,28] |
| | | Family Rikenellaceae | TAS [11,12] |
| | | Genus Alistipes | TAS [1,2] |
| | | Species Alistipes finegoldii | TAS [1,2] |
| MIGS-12 | Reference for biomaterial | Rautio et al., 2003 | TAS [1] |
| MIGS-7 | Subspecific genetic lineage (strain) | AHN2437T | TAS [1] |
| | Gram stain | negative | TAS [1] |
| | Cell shape | rod-shaped | TAS [1] |
| | Motility | non-motile | TAS [1] |
| | Sporulation | non-sporulating | TAS [1] |
| | Temperature range | mesophile | TAS [1] |
| | Optimum temperature | 37°C | TAS [1] |
| | Salinity | not reported | |
| MIGS-22 | Relationship to oxygen | strictly anaerobic | TAS [1] |
| | Carbon source | not reported | |
| | Energy metabolism | chemoorganotroph | TAS [1] |
| MIGS-6 | Habitat | probably human gut | TAS [1] |
| MIGS-6.2 | pH | not reported | |
| MIGS-15 | Biotic relationship | unknown | |
| MIGS-14 | Known pathogenicity | none | TAS [1] |
| MIGS-16 | Specific host | Homo sapiens | TAS [1] |
| MIGS-18 | Health status of host | unknown | |
| | Biosafety level | 1 | TAS [29] |
| MIGS-19 | Trophic level | unknown | |
| MIGS-23.1 | Isolation | human appendix tissue | TAS [1] |
| MIGS-4 | Geographic location | Helsinki, Finland | TAS [1] |
| MIGS-5 | Time of sample collection | 1988 | NAS |
| MIGS-4.1 | Latitude | not reported | |
| MIGS-4.2 | Longitude | not reported | |
| MIGS-4.3 | Depth | not reported | |
| MIGS-4.4 | Altitude | not reported | |

Evidence codes: TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable
Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted
property for the species, or anecdotal evidence). Evidence codes are from the Gene Ontology project [30].

Table 2. Genome sequencing project information

| MIGS ID | Property | Term |
|---|---|---|
| MIGS-31 | Finishing quality | Finished |
| MIGS-28 | Libraries used | Three genomic libraries: one 454 pyrosequence standard library, one 454 PE library (11.0 kb insert size), one Illumina library |
| MIGS-29 | Sequencing platforms | Illumina GAii, 454 GS FLX Titanium |
| MIGS-31.2 | Sequencing coverage | 133.3 × Illumina; 27.8 × pyrosequence |
| MIGS-30 | Assemblers | Newbler version 2.3, Velvet version 1.0.13, phrap version SPS - 4.24 |
| MIGS-32 | Gene calling method | Prodigal 1.4, GenePRIMP |
| | INSDC ID | CP003274 |
| | GenBank Date of Release | June 8, 2012 |
| | GOLD ID | Gc02257 |
| | NCBI project ID | 440775 |
| | Database: IMG-GEBA | 2509601035 |
| MIGS-13 | Source material identifier | DSM 17242 |
| | Project relevance | Tree of Life, GEBA |
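
The per-platform coverage values in Table 2 and the error rate stated for the finished sequence can be cross-checked with simple arithmetic. The sketch below sums the two coverage figures and converts the error-rate bound to the Phred scale (Q = -10 log10 p). The Phred conversion is a standard formula that the text itself does not apply, and all names are illustrative.

```python
import math

# Coverage per platform as listed in Table 2.
illumina_coverage = 133.3
pyrosequence_coverage = 27.8

# Combined coverage; rounding hides binary floating-point noise.
total_coverage = round(illumina_coverage + pyrosequence_coverage, 1)
print(f"Combined coverage: {total_coverage} x")  # 161.1 x


def phred_quality(error_probability):
    """Convert a per-base error probability to a Phred-scaled quality."""
    if not 0 < error_probability <= 1:
        raise ValueError("error probability must be in (0, 1]")
    return -10.0 * math.log10(error_probability)


# "Less than 1 error in 100,000 bases" is an upper bound on the error
# probability, so the resulting Phred value is a lower bound (about Q50).
finished_quality = phred_quality(1 / 100_000)
print(f"Finished-sequence quality: at least Q{finished_quality:.0f}")
```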



Table 3. Genome Statistics

| Attribute | Value | % of Total |
|---|---|---|
| Genome size (bp) | 3,734,239 | 100.00 |
| DNA coding region (bp) | 3,244,847 | 86.89 |
| DNA G+C content (bp) | 2,115,287 | 56.65 |
| Number of replicons | 1 | |
| Extrachromosomal elements | 0 | |
| Total genes | 3,302 | 100.00 |
| RNA genes | 68 | 2.06 |
| rRNA operons | 2 | |
| tRNA genes | 52 | 1.57 |
| Protein-coding genes | 3,234 | 97.94 |
| Pseudo genes | 121 | 3.66 |
| Genes with function prediction | 2,046 | 61.96 |
| Genes in paralog clusters | 1,627 | 49.27 |
| Genes assigned to COGs | 1,974 | 59.78 |
| Genes assigned Pfam domains | 2,183 | 66.11 |
| Genes with signal peptides | 967 | 29.29 |
| Genes with transmembrane helices | 642 | 19.44 |
| CRISPR repeats | 0 | |


Figure 3. Graphical map of the chromosome. From outside to the center: Genes on forward strand (color by COG
categories), Genes on reverse strand (color by COG categories), RNA genes (tRNAs green, rRNAs red, other RNAs 
black), GC content, GC skew (purple/olive). 

Table 4. Number of genes associated with the general COG functional categories

| Code | Value | %age | Description |
|---|---|---|---|
| J | 144 | 6.8 | Translation, ribosomal structure and biogenesis |
| A | | | RNA processing and modification |
| K | 140 | 6.6 | Transcription |
| L | 214 | 10.0 | Replication, recombination and repair |
| B | | | Chromatin structure and dynamics |
| D | 36 | 1.7 | Cell cycle control, cell division, chromosome partitioning |
| Y | | | Nuclear structure |
| V | 40 | 1.9 | Defense mechanisms |
| T | 81 | 3.8 | Signal transduction mechanisms |
| M | 171 | 8.0 | Cell wall/membrane biogenesis |
| N | 7 | 0.3 | Cell motility |
| Z | | | Cytoskeleton |
| W | | | Extracellular structures |
| U | 56 | 2.6 | Intracellular trafficking and secretion, and vesicular transport |
| O | 77 | 3.6 | Posttranslational modification, protein turnover, chaperones |
| C | 127 | 6.0 | Energy production and conversion |
| G | 165 | 7.7 | Carbohydrate transport and metabolism |
| E | 141 | 6.6 | Amino acid transport and metabolism |
| F | 56 | 2.6 | Nucleotide transport and metabolism |
| H | 92 | 4.3 | Coenzyme transport and metabolism |
| I | 55 | 2.6 | Lipid transport and metabolism |
| P | 113 | 5.3 | Inorganic ion transport and metabolism |
| Q | 20 | 0.9 | Secondary metabolites biosynthesis, transport and catabolism |
| R | 259 | 12.2 | General function prediction only |
| S | 137 | 6.4 | Function unknown |
| - | 1,328 | 40.2 | Not in COGs |
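
The percentages in Table 4 can be reproduced from the counts under one assumption, inferred from the numbers rather than stated in the text: each category percentage is relative to the sum of all category assignments, while the 'Not in COGs' percentage is relative to the total gene count of 3,302. In the sketch below the dictionary name is illustrative, and 'Q' is the standard COG code for secondary metabolites.

```python
# Gene counts per COG category as listed in Table 4 (empty rows omitted).
cog_counts = {
    "J": 144, "K": 140, "L": 214, "D": 36, "V": 40, "T": 81, "M": 171,
    "N": 7, "U": 56, "O": 77, "C": 127, "G": 165, "E": 141, "F": 56,
    "H": 92, "I": 55, "P": 113, "Q": 20, "R": 259, "S": 137,
}

# A gene may be assigned to more than one category, so this sum can
# exceed the number of genes assigned to COGs.
total_assignments = sum(cog_counts.values())
print("Total category assignments:", total_assignments)

# Assumption: category percentages use the assignment total as denominator.
cog_percent = {
    code: round(100.0 * n / total_assignments, 1)
    for code, n in cog_counts.items()
}
for code, value in cog_percent.items():
    print(code, cog_counts[code], value)

# 'Not in COGs' is taken relative to all predicted genes.
total_genes = 3_302
not_in_cogs = 1_328
not_in_cogs_percent = round(100.0 * not_in_cogs / total_genes, 1)
print("Not in COGs:", not_in_cogs_percent)
```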